unseen attack
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to fool with subtle manipulations, since we make judgments based only on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to identify the perturbations as non-causative factors and make predictions based only on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning toward a novel causal information bottleneck objective. Empirically, CausalDiff significantly outperforms state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).
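For intuition only, here is a minimal PyTorch-style sketch of the disentanglement idea, not the authors' implementation: an encoder splits an input into a label-causative factor s and a non-causative factor z; a toy conditional denoiser uses both to reconstruct the data, while only s feeds the classifier. The encoder/denoiser architectures, the cosine noise schedule, and the loss weight `lam` are illustrative assumptions, and `clf` is a user-supplied classifier head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Encodes an image into a label-causative factor s and a non-causative factor z."""
    def __init__(self, in_dim=3 * 32 * 32, s_dim=64, z_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU())
        self.to_s = nn.Linear(256, s_dim)  # label-causative factor
        self.to_z = nn.Linear(256, z_dim)  # label-non-causative factor

    def forward(self, x):
        h = self.backbone(x)
        return self.to_s(h), self.to_z(h)

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for the conditional diffusion model: predicts noise from (x_t, t, s, z)."""
    def __init__(self, in_dim=3 * 32 * 32, s_dim=64, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + 1 + s_dim + z_dim, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )

    def forward(self, x_t, t, s, z):
        h = torch.cat([x_t.flatten(1), t[:, None].float(), s, z], dim=1)
        return self.net(h)

def cib_step(x, y, enc, denoiser, clf, T=1000, lam=1.0):
    """One training step of a CIB-flavored objective (illustrative weighting):
    both factors assist generation, but only s may carry label information."""
    s, z = enc(x)
    t = torch.randint(0, T, (x.size(0),))
    noise = torch.randn_like(x)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / T).clamp(min=1e-3)[:, None, None, None] ** 2
    x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * noise  # forward diffusion
    gen_loss = F.mse_loss(denoiser(x_t, t, s, z), noise.flatten(1))  # (s, z) -> data
    cls_loss = F.cross_entropy(clf(s), y)                            # s -> label
    return gen_loss + lam * cls_loss
```

At inference, the intent is to infer s for a (possibly attacked) input and classify from s alone, so the perturbation is absorbed into z rather than the prediction.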
Dual Manifold Adversarial Robustness: Defense against L_p and non-L_p Adversarial Attacks
Appendix A: OM-ImageNet Details, A.1 Overview
Figure 1: Visual comparison between original images and projected images.
All classification models are trained on two P6000 GPUs with a batch size of 64 for 20 epochs; the study examines how different training choices affect the robustness of the networks against unseen attacks.
Table 4: Classification accuracy against unseen attacks applied to the OM-ImageNet test set.
Table 5: Classification accuracy against known (PGD-50 and OM-PGD-50) and unseen attacks; brighter colors indicate larger absolute differences.
- Information Technology > Security & Privacy (0.51)
- Government > Military (0.41)
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa, Jiachen T. Wang, Peng Gao, Charith Peris, Yao Ma, Rahul Gupta, Ming Jin, Prateek Mittal, Ruoxi Jia
Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel jailbreaks represents a critical challenge in AI safety. Adversarial training -- designed to make models robust against worst-case perturbations -- has been the dominant paradigm for adversarial robustness. However, due to optimization challenges and difficulties in defining realistic threat models, adversarial training methods often fail on newly developed jailbreaks in practice. This paper proposes a new paradigm for improving robustness against unseen jailbreaks, centered on the Adversarial Déjà Vu hypothesis: novel jailbreaks are not fundamentally new, but largely recombinations of adversarial skills from previous attacks. We study this hypothesis through a large-scale analysis of 32 attack papers published over two years. Using an automated pipeline, we extract and compress adversarial skills into a sparse dictionary of primitives, with LLMs generating human-readable descriptions. Our analysis reveals that unseen attacks can be effectively explained as sparse compositions of earlier skills, with explanatory power increasing monotonically as skill coverage grows. Guided by this insight, we introduce Adversarial Skill Compositional Training (ASCoT), which trains on diverse compositions of skill primitives rather than isolated attack instances. ASCoT substantially improves robustness to unseen attacks, including multi-turn jailbreaks, while maintaining low over-refusal rates. We also demonstrate that expanding adversarial skill coverage, not just data scale, is key to defending against novel attacks. Warning: This paper contains content that may be harmful or offensive in nature.
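The "sparse composition of skills" analysis can be illustrated with standard sparse dictionary learning. This sketch is not the paper's pipeline: the paper extracts skills from attack papers via an automated LLM pipeline, whereas here random vectors stand in for attack-prompt embeddings, and the dictionary size and sparsity level are assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
seen_attacks = rng.normal(size=(200, 128))   # placeholder embeddings of known attacks
unseen_attack = rng.normal(size=(1, 128))    # placeholder embedding of a novel attack

# Learn a dictionary of "skill primitives" from seen attacks.
dico = DictionaryLearning(n_components=32, transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, random_state=0)
dico.fit(seen_attacks)

# Explain the unseen attack as a sparse composition of learned primitives.
codes = sparse_encode(unseen_attack, dico.components_,
                      algorithm="omp", n_nonzero_coefs=5)
reconstruction = codes @ dico.components_
residual = np.linalg.norm(unseen_attack - reconstruction) / np.linalg.norm(unseen_attack)
print(f"active skills: {np.flatnonzero(codes)}  relative residual: {residual:.2f}")
```

A low residual under a small nonzero budget would correspond to the paper's finding that a novel attack is well explained by a few existing skill primitives; the monotonic-coverage claim amounts to this residual shrinking as the dictionary is fit on more attacks.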
- Information Technology > Security & Privacy (1.00)
- Government > Military (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Dual Manifold Adversarial Robustness: Defense against L_p and non-L_p Adversarial Attacks
In this work, we consider the scenario where the manifold information is exact, and show that this information can be very useful for improving robustness to novel attacks.
Reviewer comments and author responses: How can DMAT be exploited for standard tasks/datasets? Results are presented in the last column of Table B. PGD should not be viewed as the strongest attack for evaluation, and other strong baselines such as TRADES should be included in the main paper; results are shown in Table B. The notion of "manifold" should be clarified; we will explain this further in the paper.
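The "manifold" notion at issue can be made concrete with a sketch of an on-manifold attack: instead of perturbing raw pixels, perturb the latent code of a pretrained generator so the adversarial image stays on the learned image manifold. This is a hedged illustration, not DMAT itself; the generator `G`, latent `w`, and the L_inf budget in latent space are assumptions.

```python
import torch
import torch.nn.functional as F

def on_manifold_pgd(G, classifier, w, y, steps=10, alpha=0.01, eps=0.1):
    """PGD in latent space: maximize loss of classifier(G(w + delta)), ||delta||_inf <= eps."""
    for p in list(G.parameters()) + list(classifier.parameters()):
        p.requires_grad_(False)  # only the latent perturbation gets gradients
    delta = torch.zeros_like(w, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(classifier(G(w + delta)), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascent step on the latent perturbation
            delta.clamp_(-eps, eps)             # enforce the budget in latent space
            delta.grad.zero_()
    return G(w + delta).detach()                # an adversarial image on the manifold
```

Training against both this latent-space attack and standard pixel-space PGD is the "dual manifold" idea: robustness to L_p perturbations off the manifold and to non-L_p perturbations along it.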
Robustness Feature Adapter for Efficient Adversarial Training
Quanwei Wu, Jun Guo, Wei Wang, Yi Wang
Adversarial training (AT) with projected gradient descent is the most popular method for improving model robustness under adversarial attacks. However, the computational overhead becomes prohibitively large when AT is applied to large backbone models, and AT is also known to suffer from robust overfitting. This paper addresses both problems simultaneously, toward building more trustworthy foundation models. In particular, we propose a new adapter-based approach for efficient AT directly in the feature space. We show that the proposed adapter-based approach can improve inner-loop convergence quality by eliminating robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate the effectiveness of the new adapter-based approach across different backbone architectures and in AT at scale.
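As a rough sketch of what adapter-based AT on features could look like (not the paper's actual design): a frozen backbone, a small residual bottleneck adapter applied to its features, and a standard PGD inner loop, with only the adapter and head updated. The adapter shape, PGD hyperparameters, and the choice of input-space PGD are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustAdapter(nn.Module):
    """Bottleneck adapter with a residual connection, applied to backbone features."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, feat):
        return feat + self.up(F.relu(self.down(feat)))  # residual keeps clean features usable

def adapter_at_step(backbone, adapter, head, opt, x, y, eps=8/255, alpha=2/255, steps=10):
    """One AT step: craft PGD examples, then update only the adapter and head."""
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)  # frozen backbone is what makes the AT loop cheap

    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(head(adapter(backbone(x_adv))), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = torch.min(torch.max(x_adv.detach() + alpha * grad.sign(),
                                    x - eps), x + eps).clamp(0, 1)

    opt.zero_grad()
    F.cross_entropy(head(adapter(backbone(x_adv))), y).backward()
    opt.step()  # opt should hold only adapter and head parameters
```

Because gradients never update the backbone, each AT step touches only the small adapter and head, which is the source of the claimed efficiency at scale.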